Abstract. Traditional control theory is well-developed mainly for linear control situations. In 
non-linear cases there is no general method of generating a good control, so we have to rely on the 
ability of the experts (operators) to control them. If we want to automate their control, we must 
acquire their knowledge and translate it into a precise control strategy. 

The experts’ knowledge is usually represented in non-numeric terms, namely, in terms of 
uncertain statements of the type “if the obstacle is straight ahead, the distance to it is small, and 
the velocity of the car is medium, press the brakes hard”. Fuzzy control is a methodology that 
translates such statements into precise formulas for control. The necessary first step of this strategy 
consists of assigning membership functions to all the terms that the expert uses in his rules (in our 
sample phrase these words are “small”, “medium”, and “hard”). 

The appropriate choice of a membership function can drastically improve the quality of a 
fuzzy control. In the simplest cases, we can take the functions whose domains have equally spaced 
endpoints. Because of that, many software packages for fuzzy control are based on this choice 
of membership functions. This choice is not very efficient in more complicated cases. Therefore, 
methods have been developed that use neural networks or genetic algorithms to “tune” membership 
functions. But this tuning takes lots of time (for example, several thousand iterations are typical 
for neural networks). 

In some cases there are evident physical reasons why equally spaced domains do not work: 
e.g., if the control variable u is always positive (as when we control the temperature in a reactor), then 
negative values (that are generated by equal spacing) simply make no sense. In this case it sounds 
reasonable to choose another scale u' = f(u) to represent u, so that equal spacing will work fine 
for u'. 

In the present paper we formulate the problem of finding the best rescaling function, solve 
this problem, and show (on a real-life example) that after an optimal rescaling, the un-tuned fuzzy 
control can be as good as the best state-of-the-art traditional non-linear controls. 

1. INTRODUCTION TO THE PROBLEM 

Traditional control theory is not always applicable, so we have to use fuzzy control. 

Traditional control theory is well-developed mainly for linear control situations. In non-linear cases, 
although for many cases there are good recipes, there is still no general method of generating a 
good control (see, e.g., [M91]). 

Therefore, we have to rely on the ability of the experts (operators) to control these systems. 
If we want to automate their control, we must acquire their knowledge and transform it into a precise 
control strategy. 

The experts’ knowledge is usually represented in non-numeric terms, namely, in terms of 
uncertain statements of the type “if the obstacle is straight ahead, the distance to it is small, and 
the velocity of the car is medium, press the brakes hard”. Fuzzy control is a methodology that 
translates such statements into precise formulas for control. Fuzzy control was started by L. Zadeh 
and E. H. Mamdani [Z71], [CZ72], [Z73], [M74] in the framework of fuzzy set theory [Z65]. For the 
current state of fuzzy control the reader is referred to the surveys [S85], [L90] and [B91]. 

Choice of membership functions: an important first step of fuzzy control methodology. 

The necessary first step of this methodology consists of assigning membership functions to all the 
terms that the expert uses in his rules (in our sample phrase these words are “small”, “medium”, 
and “hard”). The appropriate choice of a membership function can drastically improve the quality 
of a fuzzy control. 

Simplest case: equally spaced functions. In the simplest cases, we can take the functions 
whose domains have equally spaced endpoints: e.g., we can fix a neutral value N (usually, N = 0) 
and a number Δ > 0, and take “negligible” with the domain [N − Δ, N + Δ], “small positive” with the 
domain [N, N + 2Δ], “medium positive” with the domain [N + Δ, N + 3Δ], etc. Correspondingly, 
“small negative” has the domain [N − 2Δ, N], “medium negative” corresponds to the domain 
[N − 3Δ, N − Δ], etc. If an interval [a − Δ, a + Δ] is given, then we can take a membership function μ(x) 
that is equal to 0 outside this interval, equal to 1 for x = a, and linear on the intervals [a − Δ, a] 
and [a, a + Δ], i.e., μ(x) = max(0, 1 − |x − a|/Δ). Many software packages for fuzzy control are 
based on this choice of membership functions. 
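A minimal sketch of the equally spaced triangular membership functions just described. It assumes the terms are indexed by an integer k (k = 0 for “negligible”, k = 1 for “small positive”, k = −1 for “small negative”, and so on); the function names are hypothetical and not taken from any fuzzy-control package.

```python
def triangular(x: float, a: float, delta: float) -> float:
    """Triangular membership: 1 at x = a, 0 outside [a - delta, a + delta], linear in between."""
    return max(0.0, 1.0 - abs(x - a) / delta)


def fuzzify(x: float, neutral: float = 0.0, delta: float = 1.0, k_max: int = 3) -> dict[int, float]:
    """Membership degrees of x in the (hypothetically indexed) terms k = -k_max..k_max.

    Term k is centred at N + k*delta, so its domain is [N + (k - 1)*delta, N + (k + 1)*delta].
    """
    return {k: triangular(x, neutral + k * delta, delta) for k in range(-k_max, k_max + 1)}
```

For instance, `fuzzify(0.25)` assigns degree 0.75 to term 0 (“negligible”), 0.25 to term 1 (“small positive”), and 0 to every other term. Inside the covered range the degrees of neighbouring terms always add up to 1.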

What is usually done in more complicated cases. This choice of equally spaced functions is 
not always very efficient in more complicated cases. Therefore, methods have been developed that 
use neural networks or genetic algorithms to “tune” membership functions (see, e.g., numerous 
papers in [RSW92]). But this tuning takes lots of time (for example, several thousand iterations 
are typical for neural networks). 

The idea of a rescaling. In some cases there are evident physical reasons why equally spaced 
domains do not work. For example, if the control variable u is always positive (e.g., if we control 
the flow of some substance into a reactor), then negative values (that will be eventually generated 
by an equal spacing method) simply make no sense. 

A natural idea is to choose another scale u' = f(u) to represent the control variable u, so 
that equal spacing will work fine for u'. This idea is in good accordance with our common-sense 
description of physical processes. For example, from the physical viewpoint it is quite possible to 
describe the strength of an earthquake by its energy, but, when we talk about its consequences, 
it is much more convenient to use a logarithmic scale (the Richter scale). Non-linear scales are 
used to describe amplifiers and noise (decibels, in electrical engineering), to describe hardness of 
different minerals in geosciences, etc. (for a general survey of different scales and rescalings see 
[SKLT71, 89]). 

In our case we want to design a scale such that for f(u) the equally spaced endpoints N − kΔ 
and N + kΔ make sense for all integers k. Therefore, we are looking for a function f(u) 
whose domain is the set of all positive values and whose range is the set of all real numbers. In 
mathematical notation, f must map (0, ∞) onto (−∞, ∞). There are lots of such functions, and 
evidently not all of them will improve the control. So we arrive at the following problem: 
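To see why such a map helps, here is a small sketch using f(u) = ln(u), one function that maps (0, ∞) onto (−∞, ∞) (chosen here only for illustration). Equally spaced centres N + kΔ in the new scale u' correspond to geometrically spaced centres exp(N + kΔ) in the original scale, all of them positive, so no term refers to a meaningless negative control. The function names are hypothetical.

```python
import math


def triangular(x: float, a: float, delta: float) -> float:
    """Triangular membership: 1 at x = a, 0 outside [a - delta, a + delta]."""
    return max(0.0, 1.0 - abs(x - a) / delta)


def centres_in_original_scale(neutral: float, delta: float, k_max: int) -> list[float]:
    """Equally spaced centres N + k*delta in u' = ln(u), mapped back via u = exp(u')."""
    return [math.exp(neutral + k * delta) for k in range(-k_max, k_max + 1)]


def fuzzify_positive(u: float, neutral: float = 0.0, delta: float = 1.0,
                     k_max: int = 3) -> dict[int, float]:
    """Fuzzify a strictly positive control u by fuzzifying its rescaled value ln(u)."""
    if u <= 0:
        raise ValueError("the rescaling ln(u) is defined only for u > 0")
    v = math.log(u)  # rescaled value u' = f(u)
    return {k: triangular(v, neutral + k * delta, delta) for k in range(-k_max, k_max + 1)}
```

With Δ = ln 2 and N = 0, the centres in the original scale are 0.25, 0.5, 1, 2, 4: each is twice the previous one, and all are positive.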

The main problem. What rescaling to choose? 

What we are planning to do. We formulate the problem of choosing the best rescaling function 
f(u) as a mathematical optimization problem, and then we solve this problem under some reasonable 
optimality criteria. As a result, we get an optimal function f(u). We show that its application 
to non-linear systems really improves fuzzy control. 
2. MOTIVATIONS OF THE PROPOSED MATHEMATICAL DEFINITIONS 

Why is this problem difficult? We want to find a scaling function f(u) that is the best in some 
reasonable sense, that is, for which some characteristic I attains the value that corresponds to the 
best performance of the resulting fuzzy control. As examples of such characteristics, we can take 
an average running time of an algorithm, or some characteristics of smoothness or stability of the 
resulting control, etc. The problem is that even for the simplest linear plants (controlled systems), 
we do not know how to compute any of these possible characteristics. How can we find f(u) for 
which I(f(u)) is optimal if we cannot compute I(f(u)) even for a single function f(u)? At first 
glance, there seems to be no answer. 

However, we will show that this problem is solvable (and give the solution). 

The basic idea for solving this kind of problem is described in [K90]; for its application to 
fuzzy logic see [KK90], to neural networks see [KQ91], to genetic algorithms see [KQF92], and to 
different problems of fuzzy control see [KQLFLKBR92]. 

We must choose a family of functions, not a single function. Suppose that for some physical 
quantity u (e.g., for the x coordinate) equal spacing leads to a reasonably good control strategy. 

In order to get numerical values of the x coordinate, we must fix some starting point and some 
measuring unit (e.g., a meter). In principle we could as well choose feet to describe length. Then 
the numerical values of all the coordinates will be different (x meters are equal to λx feet, where 
λ is the number of feet in 1 meter). However, the intervals that were equally spaced when we used 
one unit are still equally spaced if we use another unit to measure this coordinate. 

In a similar way, we could choose a different starting point for the x coordinate. If we take as 
a starting point a point that had a coordinate x₀ (so that now its coordinate is 0), then all other 
coordinates will be shifted: x → x − x₀. Again, intervals that were equal in the old scale (x) will 
still be equal if we measure them in the new scale (x − x₀). 

We can also change both the measuring unit and the starting point. This way we arrive at a 
transformation x → λx + x₀. 

Summarizing: if x is a reasonable scale, in the sense that equally spaced membership functions 
lead to a reasonably good control, then the same is true for any scale λx + x₀, where λ > 0 
and x₀ is an arbitrary real number. The reason is that if we have a sequence of equally spaced 
intervals [N + kΔ, N + (k + 1)Δ], then these intervals will remain equally spaced after the linear 
rescaling x → λx + x₀: namely, these intervals will turn into the intervals [N' + kΔ', N' + (k + 1)Δ'], 
where N' = λN + x₀ and Δ' = λΔ. 
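The claim that x → λx + x₀ maps the interval [N + kΔ, N + (k + 1)Δ] onto [N' + kΔ', N' + (k + 1)Δ'] with N' = λN + x₀ and Δ' = λΔ can be checked numerically. The sketch below does this for a few values of k; the names are hypothetical, and λ ≈ 3.28084 (feet per meter) with x₀ = −5 is just an example.

```python
def interval(neutral: float, delta: float, k: int) -> tuple[float, float]:
    """The k-th equally spaced interval [N + k*delta, N + (k + 1)*delta]."""
    return (neutral + k * delta, neutral + (k + 1) * delta)


def rescale(x: float, lam: float, x0: float) -> float:
    """Change of unit and starting point: x -> lam*x + x0 (lam > 0)."""
    return lam * x + x0


def check_preserves_spacing(neutral: float, delta: float, lam: float, x0: float,
                            ks=range(-3, 4), tol: float = 1e-9) -> bool:
    """True if every rescaled interval equals the k-th interval with N' = lam*N + x0, delta' = lam*delta."""
    new_neutral, new_delta = lam * neutral + x0, lam * delta
    for k in ks:
        lo, hi = interval(neutral, delta, k)
        got = (rescale(lo, lam, x0), rescale(hi, lam, x0))
        expected = interval(new_neutral, new_delta, k)
        if any(abs(g - e) > tol for g, e in zip(got, expected)):
            return False
    return True
```

Because λ(N + kΔ) + x₀ = (λN + x₀) + k(λΔ), the check succeeds for any λ > 0 and any x₀, up to rounding error.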

Let us now consider a scale u for which equal spacing does not work. Assume that u → f(u) 
is a transformation after which equal spacing becomes applicable. This means that if we use f(u) 
as a new scale, then equal spacing works fine. But, as we have just shown, for any λ > 0 and x₀, 
equal spacing will also work fine for the scale λf(u) + x₀. 

Therefore, if f(u) is a function that transforms the initial scale into a scale for which equal 
spacing works fine, then for every λ > 0 and x₀ the function f'(u) = λf(u) + x₀ has the same 
desired property. 

This means that there is no way to pick one function f(u), because with any function f(u), 
the whole family of functions λf(u) + x₀ has the same property. Therefore, the desired functions 
form a family {λf(u) + x₀ : λ > 0, x₀ ∈ ℝ}. Hence, instead of choosing a single function, we must 
formulate a problem of choosing a family. 

Which family is the best? Among all such families, we want to choose the best one. In 
formalizing what “the best” means, we follow the general idea outlined in [K90] and applied to 
neural networks in [KQ91]. The criteria to choose may be computational simplicity, stability or 
smoothness of the resulting control, etc. In mathematical optimization problems, numeric criteria 
are most frequently used, where to every family we assign some value expressing its performance, 
and choose a family for which this value is maximal. However, it is not necessary to restrict 
ourselves to such numeric criteria only. For example, if we have several different families that lead 
to the same value of an average stability characteristic T (where, say, a smaller T means better 
stability), we can choose among them the one that leads to the maximal value of a smoothness 
characteristic P. In this case, the actual criterion that we use to compare two families is not 
numeric, but more complicated: a family Φ₁ is better than the family Φ₂ if and only if either 
T(Φ₁) < T(Φ₂), or T(Φ₁) = T(Φ₂) and P(Φ₁) > P(Φ₂). A criterion can be even more complicated. 
What a criterion must do is to allow us, for every pair of families, to tell whether the first family 
is better with respect to this criterion (we’ll denote it by Φ₂ < Φ₁), or the second is better 
(Φ₁ < Φ₂), or these families have the same quality in the sense of this criterion (we’ll denote 
it by Φ₁ ~ Φ₂). 
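A sketch of such a non-numeric (lexicographic) criterion. It assumes each family has been summarized by two hypothetical numbers, a stability characteristic T (smaller is better) and a smoothness characteristic P (larger is better); how these would actually be computed is exactly what the text says we do not know, so the values below are invented for illustration.

```python
from typing import NamedTuple


class Family(NamedTuple):
    name: str
    T: float  # hypothetical stability characteristic; smaller is better
    P: float  # hypothetical smoothness characteristic; larger is better


def better(f1: Family, f2: Family) -> bool:
    """True if f1 is better than f2 (written f2 < f1 in the text): compare T first, then P."""
    return f1.T < f2.T or (f1.T == f2.T and f1.P > f2.P)


def equivalent(f1: Family, f2: Family) -> bool:
    """Same quality with respect to this criterion (written f1 ~ f2)."""
    return f1.T == f2.T and f1.P == f2.P
```

For example, with a = Family("a", 1.0, 0.5), b = Family("b", 1.0, 0.8) and c = Family("c", 2.0, 0.9), family b beats a (equal T, larger P), and both beat c (smaller T), even though c has the largest P. For every pair, exactly one of “better”, “worse”, or “equivalent” holds.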

The criterion for choosing the best family must be consistent. Of course, it is necessary 
to demand that these choices be consistent, e.g., if Φ₁ < Φ₂ and Φ₂ < Φ₃ then Φ₁ < Φ₃. 

The criterion must be final. Another natural demand is that this criterion must be final in the 
sense that it must choose a unique optimal family (i.e., a family that is better with respect to this 
criterion than any other family). 

The reason for this demand is very simple. If a criterion does not choose any family at all, then 
it is of no use. If several different families are “the best” according to this criterion, then we still 
have a problem choosing the absolute “best” family. Therefore, we need some additional criterion 
for that choice. For example, if several families turn out to have the same stability characteristics, 
we can choose among them a family with maximal smoothness. So what we actually do in this 
case is abandon that criterion for which there were several “best” families, and consider a new 
“composite” criterion instead: Φ₁ is better than Φ₂ according to this new criterion if either it was 
better according to the old criterion, or according to the old criterion they had the same quality 
and Φ₁ is better than Φ₂ according to the additional criterion. In other words, if a criterion does 
not allow us to choose a unique best family, it means that this criterion is not final; we have 
to modify it until we come to a final criterion that will have that property. 

The criterion must be reasonably invariant. We have already discussed the effect of changing 
units in a new scale f(u). But it is also possible to change units in the original scale, in which 
the control u is described. If we use a unit that is c times smaller, then a control whose numeric 
value in the original scale was u will now have the numeric value cu. For example, if we initially 
measured the flux of a substance (e.g., rocket fuel) into the reactor in kg/sec, we can now switch 
to lb/sec. 

Comment. There is no physical sense in changing the starting point for u, because we consider 
the control variable that takes only positive values, and so 0 is a fixed value, corresponding to the 
minimal possible control. 

We are looking for a universal rescaling method that will be applicable to any reasonable 
situation (we do not want it to be adjustable to the situation, because the whole purpose of 
this rescaling is to avoid time-consuming adjustments). Suppose now that we first used kg/sec, 
compared two different scaling functions f(u) and g(u), and it turned out that f(u) is better (or, 
to be more precise, that the family Φ = {λf(u) + x₀} is better than the family Ψ = {λg(u) + x₀}). 
It sounds reasonable to expect that the relative quality of the two scaling functions should not 
depend on what units we use for u. So we expect that when we apply the same methods, but with 
the values of control expressed in lb/sec, the results of applying f(u) will still be better than 
the results of applying g(u). But the result of applying the function f(u) to the control in lb/sec 
can be expressed in old units (kg/sec) as f(cu), where c is the ratio of these two units. So the result 
of applying the rescaling function f(u) to the data in new units (lb/sec) coincides with the result of 
applying a new scaling function f_c(u) = f(cu) to the control in old units (kg/sec). So we conclude 
that if f(u) is better than g(u), then f_c(u) must be better than g_c(u), where f_c(u) = f(cu) and 
g_c(u) = g(cu). This must be true for every c because we could use not only kg/sec or lb/sec but 
arbitrary units as well. 

Now we are ready for the formal definitions. 

3. DEFINITIONS AND THE MAIN RESULT 

Definitions. By a rescaling function (or a rescaling for short), we mean a strictly monotonic 
function that maps the set of all positive real numbers (0, ∞) onto the set of all real numbers 
(−∞, +∞). We say that two rescalings f(u) and f'(u) are equivalent if f'(u) = Cf(u) + x₀ for 
some positive constant C and for some real number x₀. 

Comment. As we have already mentioned, if we apply two equivalent rescalings, we will get two 
scales that either both lead to a good control or are both inadequate. 

By a family we mean the set of functions {Cf(u) + x₀}, where f(u) is a fixed rescaling, C runs 
over all positive real numbers, and x₀ runs over all real numbers. The set of all families will be 
denoted by S. 

A pair of relations (<,~) is called consistent [K90], [KK90], [KQ91] if it satisfies the following 
conditions: 

(1) if F < G and G < H then F < H; 

(2) F ~ F; 

(3) if F ~ G then G ~ F; 

(4) if F ~ G and G ~ H then F ~ H; 

(5) if F < G and G ~ H then F < H; 

(6) if F ~ G and G < H then F < H; 

(7) if F < G, then neither G < F nor F ~ G. 
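Conditions (1)–(7) can be tested mechanically on a finite set of alternatives. The sketch below reads less(F, G) as F < G (G is better) and checks every condition by brute force. The lexicographic relation on hypothetical (T, P) pairs (smaller T better, ties broken by larger P) is used only as an example of a consistent pair; the relation “every two different alternatives are worse than each other” is an example that violates it.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Alt = Tuple[float, float]            # hypothetical alternative summarized as (T, P)
Rel = Callable[[Alt, Alt], bool]


def is_consistent(items: Sequence[Alt], less: Rel, equiv: Rel) -> bool:
    """Brute-force check of conditions (1)-(7) on a finite set of alternatives."""
    for F in items:
        if not equiv(F, F):                                   # (2)
            return False
    for F, G in product(items, repeat=2):
        if equiv(F, G) and not equiv(G, F):                   # (3)
            return False
        if less(F, G) and (less(G, F) or equiv(F, G)):        # (7)
            return False
    for F, G, H in product(items, repeat=3):
        if less(F, G) and less(G, H) and not less(F, H):      # (1)
            return False
        if equiv(F, G) and equiv(G, H) and not equiv(F, H):   # (4)
            return False
        if less(F, G) and equiv(G, H) and not less(F, H):     # (5)
            return False
        if equiv(F, G) and less(G, H) and not less(F, H):     # (6)
            return False
    return True


def lex_less(F: Alt, G: Alt) -> bool:
    """F < G if G has smaller T, or equal T and larger P."""
    return (-F[0], F[1]) < (-G[0], G[1])


def lex_equiv(F: Alt, G: Alt) -> bool:
    """Equivalent if both characteristics coincide."""
    return F == G
```

A brute-force check of this kind only certifies consistency on the finite sample it is given, not on the whole (infinite) set S of families.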

Assume a set A is given. Its elements will be called alternatives. By an optimality criterion 
we mean a consistent pair (<, ~) of relations on the set A of all alternatives. If G < F, we say that 
F is better than G; if F ~ G, we say that the alternatives F and G are equivalent with respect to 
this criterion. We say that an alternative F is optimal (or best) with respect to a criterion (<,~) 
if for every other alternative G either G < F or F ~ G. 

We say that a criterion is final if there exists an optimal alternative, and this optimal alternative 
is unique. 
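The definitions of an optimal alternative and of a final criterion can be illustrated on a finite set. The sketch builds (<, ~) from a key (a larger key means better) and lists all optimal alternatives; the criterion is final when this list has exactly one element. The (T, P) values and helper names are hypothetical. Using T alone leaves a tie between two best alternatives, so that criterion is not final; adding P as a tie-breaker, as in the “composite” criterion of Section 2, makes it final.

```python
from typing import Callable, List, Sequence, Tuple


def criterion_from_key(key: Callable) -> Tuple[Callable, Callable]:
    """Build the pair (less, equiv) from a key; larger key values are better."""
    def less(F, G) -> bool:   # F < G: G is better than F
        return key(F) < key(G)

    def equiv(F, G) -> bool:  # F ~ G: same quality
        return key(F) == key(G)

    return less, equiv


def optimal(items: Sequence, less: Callable, equiv: Callable) -> List:
    """Alternatives F such that for every other G, either G < F or F ~ G."""
    return [F for F in items
            if all(less(G, F) or equiv(F, G) for G in items if G != F)]


def is_final(items: Sequence, less: Callable, equiv: Callable) -> bool:
    """A criterion is final if an optimal alternative exists and it is unique."""
    return len(optimal(items, less, equiv)) == 1


items = [(1.0, 0.5), (1.0, 0.8), (2.0, 0.9)]                   # hypothetical (T, P) pairs
less_T, equiv_T = criterion_from_key(lambda a: -a[0])           # smaller T is better
less_TP, equiv_TP = criterion_from_key(lambda a: (-a[0], a[1]))  # then larger P
```

Here `optimal(items, less_T, equiv_T)` returns the two alternatives with T = 1.0, while `optimal(items, less_TP, equiv_TP)` returns only (1.0, 0.8).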

Comment. In the present paper we consider optimality criteria on the set S of all families. 

Definitions. By a result of a unit change in a function f(u) to a unit that is c > 0 times smaller 
we mean the function f_c(u) = f(cu). By the result of a unit change in a family Φ by c > 0 we mean 
the set of all the functions that are obtained by this unit change from f ∈ Φ. This result will be 
denoted by cΦ. We say that an optimality criterion on S is unit-invariant if for every two families 
Φ and Ψ and for every number c > 0 the following two conditions are true: 

i) if Ψ is better than Φ in the sense of this criterion (i.e., Φ < Ψ), then cΦ < cΨ. 

ii) if Φ is equivalent to Ψ in the sense of this criterion (i.e., Φ ~ Ψ), then cΦ ~ cΨ. 

THEOREM. If a family Φ is optimal in the sense of some optimality criterion that is final and 
unit-invariant, then every rescaling f(u) from Φ is equivalent to log(u). 

(Proof is given in Section 5). 

Comment. This means that the optimal rescalings are of the type γ log(u) + a for some real 
numbers γ > 0 and a. 

4. CASE STUDY: APPLICATION OF LOGARITHMIC RESCALING 
TO FUZZY CONTROL (BRIEF DESCRIPTION) 

Description of a plant. We design a control for a chemical reaction within a constant-volume, 
non-adiabatic, continuously stirred tank reactor (CSTR). The model that describes the CSTR is 
[M90]: 

$$\dot{x}_1 = -x_1 + Da\,(1 - x_1)\exp\left(\frac{x_2}{1 + x_2/\gamma}\right),$$

$$\dot{x}_2 = -x_2 + B\,Da\,(1 - x_1)\exp\left(\frac{x_2}{1 + x_2/\gamma}\right) - u\,(x_2 - x_c),$$

where x₁ is the conversion rate, x₂ is the dimensionless temperature, u is the dimensionless 
heat transfer coefficient, and Da, B, γ, and x_c are constant model parameters. The objective of 
the control is to stabilize the system (i.e., bring it closer to the equilibrium point). 
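For readers who want to experiment with the plant, here is a minimal sketch that evaluates the right-hand side of the CSTR model and integrates it by forward Euler with a constant control u. It is not the fuzzy controller of this case study. The parameter values (Da = 0.072, B = 8, γ = 20, x_c = 0), the constant control, the step size, and the function names are hypothetical choices for illustration, not values taken from [M90]. Forward Euler is used only for simplicity.

```python
import math


def cstr_rhs(x1: float, x2: float, u: float, Da: float = 0.072, B: float = 8.0,
             gamma: float = 20.0, xc: float = 0.0) -> tuple[float, float]:
    """Time derivatives (dx1/dt, dx2/dt) of the CSTR model; parameter defaults are hypothetical."""
    rate = Da * (1.0 - x1) * math.exp(x2 / (1.0 + x2 / gamma))
    dx1 = -x1 + rate
    dx2 = -x2 + B * rate - u * (x2 - xc)
    return dx1, dx2


def simulate(x1: float, x2: float, u: float, t_end: float = 20.0,
             dt: float = 0.01) -> tuple[float, float]:
    """Forward-Euler integration with a constant (open-loop) control u."""
    for _ in range(int(round(t_end / dt))):
        dx1, dx2 = cstr_rhs(x1, x2, u)
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2
```

Starting from x₁ = x₂ = 0 with u = 0.3, the state stays in the physically meaningful region 0 ≤ x₁ ≤ 1, 0 ≤ x₂ ≤ B for these hypothetical parameters.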

What we did. We applied a logarithmic rescaling x₂ → X = log x₂, and used membership 
functions with equal spacing for X . No further adjustment of membership functions was made. 
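A sketch of what “equal spacing in X = log x₂” can look like in code. It fuzzifies a positive measurement x₂ with equally spaced triangular terms in X = ln x₂ and produces a crisp control as a weighted average of per-term control values. The rule table, the weighted-average defuzzification (zero-order Sugeno style), the clamping of X to the covered range, and all names are assumptions for illustration; this section does not specify the rule base or defuzzification actually used in [VT92].

```python
import math


def triangular(x: float, a: float, delta: float) -> float:
    """Triangular membership: 1 at x = a, 0 outside [a - delta, a + delta]."""
    return max(0.0, 1.0 - abs(x - a) / delta)


def fuzzify_log(x2: float, neutral: float = 0.0, delta: float = 0.5,
                k_max: int = 4) -> dict[int, float]:
    """Degrees of terms k = -k_max..k_max, equally spaced in X = ln(x2)."""
    if x2 <= 0:
        raise ValueError("ln(x2) requires x2 > 0")
    X = math.log(x2)
    # Assumption: clamp X so the outermost terms behave as shoulders.
    X = min(max(X, neutral - k_max * delta), neutral + k_max * delta)
    return {k: triangular(X, neutral + k * delta, delta) for k in range(-k_max, k_max + 1)}


def fuzzy_control(x2: float, rule_output: dict[int, float]) -> float:
    """Weighted average of hypothetical per-term control values rule_output[k]."""
    degrees = fuzzify_log(x2)
    num = sum(d * rule_output[k] for k, d in degrees.items())
    den = sum(degrees.values())
    return num / den
```

With the hypothetical rule table {k: 0.5 + 0.1·k}, a measurement x₂ = 1 (X = 0) fires only the central term and yields u = 0.5; larger temperatures give larger heat-transfer coefficients.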

Results. Even without any further adjustment the results of this control were comparable to the 
results of applying the intelligent “gain scheduled” (non-linear) PID controller ([HK85], [M90]). In 
other words, we got a control that was as good as the one generated by the state-of-the-art traditional 
control theory with respect to stability and controllability of the plant. 

With respect to computational complexity, our fuzzy controller is much simpler. 

Rescaling is necessary. Without the rescaling, we got a fuzzy control whose quality was much 
worse than that of a PID controller. 

Details. The details of this case study were published in [VT92]. 

5. PROOF OF THE MAIN RESULT 

The idea of this proof is as follows: first we prove that the optimal family is unit-invariant (in 
part 1), and from that, in part 2, we conclude that any function f from Φ_opt satisfies a functional 
equation whose solutions are known. 

1. Let us first prove that the optimal family exists and is unit-invariant in the sense that 
Φ_opt = cΦ_opt for all c > 0. Indeed, we assumed that the optimality criterion is final; therefore 
there exists a unique optimal family Φ_opt. Let us now prove that this optimal family is 
unit-invariant (this proof is practically the same as in [K90], [KQ91], or [KQF92]). The fact that 
Φ_opt is optimal means that for every other family Φ, either Φ < Φ_opt or Φ_opt ~ Φ. If Φ_opt ~ Φ for 
some Φ ≠ Φ_opt, then from the definition of the optimality criterion we can easily deduce that Φ is 
also optimal, which contradicts the fact that there is only one optimal family. So for every Φ, 
either Φ < Φ_opt or Φ = Φ_opt. 

Take an arbitrary c > 0 and apply this conclusion to Φ = cΦ_opt. If cΦ_opt = Φ < Φ_opt, then, 
applying the invariance of the optimality criterion (condition i)) with the factor 1/c, we conclude 
that Φ_opt < c⁻¹Φ_opt, and that conclusion contradicts the choice of Φ_opt as the optimal family. 
So cΦ_opt < Φ_opt is impossible, and therefore cΦ_opt = Φ_opt, i.e., the optimal family is really 
unit-invariant. 

2. Let us now deduce the actual form of the functions f(u) from the optimal family Φ_opt. 

If f(u) is such a function, then the result f(cu) of changing the unit of u to a c times smaller 
unit belongs to cΦ_opt, and so, due to part 1, it belongs to Φ_opt. But by the definition of a family, 
all its functions can be obtained from each other by a linear transformation Cf(u) + x₀; therefore, 
f(cu) = Cf(u) + x₀ for some C and x₀. These values C and x₀ depend on c. So we arrive at 
the following functional equation for f(u): 

$$f(cu) = C(c)\,f(u) + x_0(c).$$

In the survey on functional equations [A66] the solutions of this equation are not explicitly given, 
but for a similar functional equation f(x + y) = f(x)h(y) + k(y) all solutions are enumerated in 
Corollary 1 to Theorem 1, Section 3.1.2 of [A66]: they are f(x) = γx + a and f(x) = γ exp(cx) + a, 
where γ ≠ 0, c ≠ 0, and a are arbitrary constants. So, let us reduce our equation to the one with 
known solutions. 
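The functional equation f(cu) = C(c)f(u) + x₀(c) can be sanity-checked numerically: for a given c, solve for C and x₀ from two sample points, then measure how well the relation holds at other points. The logarithm satisfies it exactly, with C(c) = 1 and x₀(c) = ln c. The alternative u − 1/u, a hypothetical example that is also strictly increasing and maps (0, ∞) onto (−∞, ∞), does not. This is only an illustration, not a substitute for the proof.

```python
import math
from typing import Callable, Tuple


def fit_linear_relation(f: Callable[[float], float], c: float,
                        u_a: float = 1.0, u_b: float = 2.0) -> Tuple[float, float]:
    """Solve f(c*u) = C*f(u) + x0 for C and x0 using the two sample points u_a, u_b."""
    C = (f(c * u_b) - f(c * u_a)) / (f(u_b) - f(u_a))
    x0 = f(c * u_a) - C * f(u_a)
    return C, x0


def max_residual(f: Callable[[float], float], c: float,
                 samples=(0.1, 0.5, 3.0, 10.0)) -> float:
    """Largest violation of f(c*u) = C*f(u) + x0 at other sample points."""
    C, x0 = fit_linear_relation(f, c)
    return max(abs(f(c * u) - (C * f(u) + x0)) for u in samples)


def other_rescaling(u: float) -> float:
    """Hypothetical alternative rescaling: strictly increasing, (0, inf) onto (-inf, inf)."""
    return u - 1.0 / u
```

For c = 2, `fit_linear_relation(math.log, 2.0)` gives C = 1 and x₀ = ln 2 with a residual at rounding level, while `max_residual(other_rescaling, 2.0)` is larger than 1, so u − 1/u cannot belong to a unit-invariant family.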

The only difference between these two equations is that in our equation the argument is a product cu, 
whereas in the known equation it is a sum x + y. There is a well-known way to reduce a product to a 
sum: turn to logarithms, because log(ab) = log(a) + log(b). For simplicity, let us use natural 
logarithms ln. So let us introduce new variables X = ln(u) and Y = ln(c). In terms of these new 
variables, u = exp(X) and c = exp(Y). Substituting these values into our functional equation, and 
taking into consideration that exp(X)exp(Y) = exp(X + Y), we conclude that 

$$F(X + Y) = H(Y)\,F(X) + K(Y),$$

where we denoted F(X) = f(exp(X)), H(Y) = C(exp(Y)), and K(Y) = x₀(exp(Y)). So, according to 
the above-cited result, either F(X) = γX + a, or F(X) = γ exp(cX) + a. 

From F(X) = f(exp(X)), we conclude that f(u) = F(ln(u)); therefore either f(u) = γ ln(u) + a, 
or f(u) = γ exp(c ln(u)) + a = γu^c + a. In the second case the function f(u) maps (0, ∞) onto 
a half-line, (a, ∞) or (−∞, a) depending on the sign of γ, whereas we defined a rescaling as a 
function whose values run over all real numbers. So the second case is impossible, and 
f(u) = γ ln(u) + a, which means that f(u) is equivalent to a logarithm. Q.E.D. 

6. CONCLUSIONS 

One of the important steps in designing a fuzzy control is the choice of the membership 
functions for all the terms that the experts use. This choice strongly influences the quality of the 
resulting control. 

For simple controlled systems, it is sufficient to have equally spaced membership functions, 
i.e., functions that have similar shapes (usually triangular or trapezoidal) and are located in intervals 
of equal length ..., [N − Δ, N + Δ], [N, N + 2Δ], [N + Δ, N + 3Δ], ... 

For complicated systems this choice does not lead to a good fuzzy control, so it is necessary to 
tune the membership functions by applying neural networks or genetic algorithms. This is a very 
time-consuming procedure, and therefore, it is desirable to avoid it as much as possible. 

We consider the case when the equally spaced membership functions are inadequate because 
the control variable u can take only positive values. Such situations occur, for example, when we 
control the flux of substances into a chemical reactor (e.g., the flux of fuel into an engine). Our 
idea is to “rescale” this variable, i.e., to use a new variable u' = f(u), and to choose a function 
f(u) in such a way that we can apply membership functions that are equally spaced in u'. 

We give a mathematical proof that the optimal rescaling is logarithmic (f(u) = a log(u) + b). 
We also show on a real-life example of a non-linear chemical reactor that the resulting fuzzy control, 
without any further tuning of membership functions, can be comparable in quality with the best 
state-of-the-art non-linear controls of traditional control theory. 